NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences.
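Two of those differences can be seen in a short sketch. The variable names here are illustrative only and are not used in the exercises that follow.

```python
import numpy as np

# A Python list can mix types; an ndarray stores a single dtype.
py_list = [1, 2.5, 3]
arr = np.array(py_list)       # the integers are upcast to float64

# Arithmetic on an ndarray is applied element by element,
# whereas `*` on a list repeats the list.
doubled_arr = arr * 2
doubled_list = py_list * 2

print(arr.dtype)       # float64
print(doubled_arr)     # [2. 5. 6.]
print(doubled_list)    # [1, 2.5, 3, 1, 2.5, 3]
```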
Using only NumPy, perform the following.
In [1]:
import numpy as np
1 - Create an array of the form $b = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$
In [2]:
b = np.arange(1, 3).reshape(2, 1) # Using reshape
#b = np.arange(1, 3)[:, np.newaxis] # Using newaxis
print(b)
2 - Create a 2x2 array of the form $X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$
In [3]:
X = np.arange(1, 5).reshape(2, 2)
print(X)
3 - Multiply the two arrays element-wise, then using matrix multiplication, and finally compute the inner product $b \cdot b$.
In [4]:
prod1 = X * b # element wise
prod2 = X.dot(b) # matrix multiplication
prod3 = b.reshape(2).dot(b.reshape(2)) # inner product
print('Element wise product:\n', prod1)
print('Matrix multiplication:\n', prod2)
print('Inner product:\n', prod3)
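As a cross-check on the three products, here is a minimal sketch with the arrays written out explicitly. It uses the `@` operator and `np.inner` as alternatives to `.dot`; the names `elementwise`, `matmul` and `inner` are illustrative.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])
b = np.array([[1], [2]])      # shape (2, 1)

# b is broadcast across the columns of X, so row i of X is scaled by b[i].
elementwise = X * b

# For 2-D arrays the @ operator gives the same result as X.dot(b).
matmul = X @ b

# Inner product of b with itself: 1*1 + 2*2.
inner = np.inner(b.ravel(), b.ravel())

print(elementwise)   # [[1 2]
                     #  [6 8]]
print(matmul)        # [[ 5]
                     #  [11]]
print(inner)         # 5
```

The element-wise product keeps the shape (2, 2), the matrix product has shape (2, 1), and the inner product is a scalar.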
4 - For each of your results in part (3), print the shape and data type.
In [5]:
for pr in [prod1, prod2, prod3]:
    print('-----------------------------')
    print('Shape: {}'.format(pr.shape))
    print('Data type: {}'.format(pr.dtype))
5 - Reshape (or flatten) the array $X$ so that it consists of only one row.
In [6]:
print(X.flatten())
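Note that `flatten` returns a 1-D array rather than a 2-D array with a single row. If an explicit row is wanted, `reshape(1, -1)` is one option. A small sketch comparing the two (the names `flat` and `row` are illustrative):

```python
import numpy as np

X = np.arange(1, 5).reshape(2, 2)

flat = X.flatten()       # 1-D copy of the data
row = X.reshape(1, -1)   # 2-D array with one row; -1 infers the column count

print(flat.shape)   # (4,)
print(row.shape)    # (1, 4)
```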
6 - Create an array of the integers 1 to 10, inclusive, setting the datatype to float.
In [7]:
print(np.arange(1, 11, dtype='float'))
7 - Create a 10x10 identity matrix using the built-in NumPy function.
In [8]:
print(np.eye(10))
8 - Create a 10x10 identity matrix using a for loop.
In [9]:
identity = np.zeros((10, 10))
for i in range(0, 10):
    for j in range(0, 10):
        if i == j:
            identity[i, j] = 1
print(identity)
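Since only the diagonal entries change, a single loop is enough. The sketch below is one possible variant, checked against `np.eye`; `n` and `identity_single_loop` are illustrative names.

```python
import numpy as np

n = 10
identity_single_loop = np.zeros((n, n))
for i in range(n):
    # Only the diagonal entry of each row needs to be set.
    identity_single_loop[i, i] = 1

print(np.array_equal(identity_single_loop, np.eye(n)))   # True
```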
9 - Generate a set of random data, $X$, drawn from a normal distribution, consisting of 9 columns of 100 rows each, then attach a column of all ones, resulting in a 100x10 matrix for $X$. Next generate a random array $\beta$, drawn from a uniform distribution, of length 10. Also make an array, $\epsilon$ of length 100, drawn from a normal distribution. Finally, compute a vector $\vec{y}$ such that $\vec{y} = X\beta + \epsilon$. Be sure to set the random seed to 0 before drawing any random numbers. All random numbers should be on the interval [0, 1).
In [10]:
# Set seed to 0
np.random.seed(0)
# Random matrix from normal distribution + column of ones
X = np.random.normal(size=(100, 9))
X = np.hstack([X, np.ones(100).reshape(100, 1)])
#print('Shape of X:', X.shape)
# Random array from uniform distribution
np.random.seed(0)
β = np.random.uniform(size=10) # defaults are ok for random number interval
# Random array from normal distribution
np.random.seed(0)
ϵ = np.random.normal(size=100)
# Compute y
y = X.dot(β) + ϵ
y
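One thing to be aware of in the cell above: calling `np.random.seed(0)` again restarts the random stream, so the noise vector repeats the first values drawn for the matrix. The sketch below illustrates this and shows an alternative reading of the exercise in which the seed is set only once. The names ending in `_draws`, `_reseeded` and `_once` are illustrative, and seeding once gives different numbers from the cell above.

```python
import numpy as np

np.random.seed(0)
X_draws = np.random.normal(size=(100, 9))

np.random.seed(0)                      # re-seeding restarts the stream
eps_reseeded = np.random.normal(size=100)

# Compare the noise with the first 100 values drawn for X (row-major order).
print(np.array_equal(X_draws.ravel()[:100], eps_reseeded))

# Alternative: seed once, so each draw continues the same stream.
np.random.seed(0)
X_once = np.hstack([np.random.normal(size=(100, 9)), np.ones((100, 1))])
beta_once = np.random.uniform(size=10)
eps_once = np.random.normal(size=100)
y_once = X_once @ beta_once + eps_once

print(X_once.shape, y_once.shape)      # (100, 10) (100,)
```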
10 - Using the vector $\vec{y}$ computed in part 9, create a vector $\vec{c}$ containing the labels "positive" or "negative" for each value in $\vec{y}$, treating 0 as positive. Bonus: Do it with a one-liner.
In [11]:
c = np.array(['positive' if el else 'negative' for el in y >= 0])
print(c)
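`np.where` offers another one-liner that avoids the Python-level loop. A minimal sketch on a small made-up vector (`y_demo` is hypothetical data, not the $\vec{y}$ from part 9):

```python
import numpy as np

y_demo = np.array([-1.5, 0.0, 2.3])   # hypothetical values for illustration

# Pick 'positive' where the condition holds, 'negative' elsewhere.
c_demo = np.where(y_demo >= 0, 'positive', 'negative')

print(c_demo)   # ['negative' 'positive' 'positive']
```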
11 - Using the classes generated in part 10, separate the matrix $X$ into two smaller matrices, $X_p$ and $X_n$, containing only the rows which map to positive or negative values respectively.
In [12]:
Xp = X[c == 'positive', :]
print(Xp.shape)
Xn = X[c == 'negative', :]
print(Xn.shape)
print(X[0] == Xp[0])
print(X[5] == Xn[0])
print(X[-1] == Xp[-1])
print(X[-2] == Xn[-1])
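The same split can be written with a single boolean mask and its negation, which also makes it easy to check that every row ends up in exactly one of the two matrices. A sketch on small made-up data (`X_demo` and `labels` are hypothetical):

```python
import numpy as np

X_demo = np.arange(12).reshape(4, 3)   # hypothetical 4x3 matrix
labels = np.array(['positive', 'negative', 'positive', 'negative'])

mask = labels == 'positive'   # boolean array, one entry per row
X_pos = X_demo[mask]          # rows where mask is True
X_neg = X_demo[~mask]         # rows where mask is False

print(X_pos.shape, X_neg.shape)                            # (2, 3) (2, 3)
print(X_pos.shape[0] + X_neg.shape[0] == X_demo.shape[0])  # True
```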
12 - Generate a meshgrid on the interval [0, 1], of shape 100x100. Then compute the Euclidean distance from the origin, given by $d = \sqrt{x^2 + y^2}$, for each point $(x_n, y_n)$ in the grid. Bonus: Do it with a one-liner.
In [13]:
x, y = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
d = np.sqrt(x**2 + y**2)
d2 = np.fromfunction(lambda i, j: np.sqrt((i/99)**2 + (j/99)**2), (100, 100), dtype=float)
print(d[-1, -1])
print(d2[-1, -1])
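Another possible one-liner uses `np.hypot` with broadcasting instead of an explicit meshgrid. The sketch below compares it against the meshgrid version; `t`, `d_broadcast` and `d_mesh` are illustrative names.

```python
import numpy as np

t = np.linspace(0, 1, 100)

# A column vector against a row vector broadcasts to a 100x100 grid.
d_broadcast = np.hypot(t[:, np.newaxis], t[np.newaxis, :])

# Reference computation with an explicit meshgrid.
xx, yy = np.meshgrid(t, t)
d_mesh = np.sqrt(xx**2 + yy**2)

print(d_broadcast.shape)                  # (100, 100)
print(np.allclose(d_broadcast, d_mesh))   # True
```

The far corner of the grid is the point (1, 1), so its distance from the origin is $\sqrt{2}$.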
13 - Generate a set of 100 values, $p$, on the interval $[0, 2\pi]$ and two vectors, $\vec{x}, \vec{y}$ such that $\vec{x} = \cos(p)$ and $\vec{y} = \sin(p)$. Then compute the vector $\vec{r} = \sqrt{x^2 + y^2}$. Comment on your results.
In [14]:
p = np.linspace(0, 2*np.pi, 100)
x = np.cos(p)
y = np.sin(p)
r = np.sqrt(x**2 + y**2)
print(r[(r < 0.9999999) | (r > 1.0000001)])
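The filter above prints an empty array because $\cos^2(p) + \sin^2(p) = 1$ for every $p$: the points $(x, y)$ lie on the unit circle, so $r$ equals 1 up to floating-point rounding. A sketch of the same check using `np.allclose`:

```python
import numpy as np

p = np.linspace(0, 2 * np.pi, 100)
r = np.sqrt(np.cos(p)**2 + np.sin(p)**2)

# Every radius should equal 1 within floating-point tolerance.
print(np.allclose(r, 1.0))   # True

# Largest deviation from 1, to see the size of the rounding error.
print(np.abs(r - 1.0).max())
```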
14 - Generate two lists, $a$ and $b$, consisting of 10 randomly drawn values from a normal and uniform distribution respectively. Compute the mean and median of each.
In [15]:
a = np.random.normal(size=10)
b = np.random.uniform(size=10)
print('Normal distribution mean: {:.5f}\nNormal distribution median: {:.5f}'.format(a.mean(), np.median(a)))
print('Uniform distribution mean: {:.5f}\nUniform distribution median: {:.5f}'.format(b.mean(), np.median(b)))
15 - Using $a$ from part 14, create a new list $c$ by assigning c = a. Now change the shape of $c$. Comment on your results.
In [16]:
c = a
c.reshape(5, 2) # returns a reshaped array; the result is discarded here
print(c.shape)
print(a.shape)
16 - How would you solve the problem that appeared in part 15?
In [17]:
c = a
c = c.reshape(5, 2) # rebind c to the reshaped array; a keeps its shape
print(c.shape)
print(a.shape)
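A related point worth keeping in mind: plain assignment does not copy an array, and `reshape` usually returns a view that shares memory with the original. If an independent array is wanted, `copy()` is one way to get it. The sketch below illustrates this on made-up data; `a_demo`, `alias`, `view` and `independent` are illustrative names.

```python
import numpy as np

a_demo = np.arange(10.0)

alias = a_demo                   # same object, no data copied
print(alias is a_demo)           # True

view = a_demo.reshape(5, 2)      # new shape, same underlying data
print(np.shares_memory(a_demo, view))   # True

view[0, 0] = 99.0                # writing through the view changes a_demo
print(a_demo[0])                 # 99.0

independent = a_demo.copy().reshape(5, 2)   # copy first, then reshape
independent[0, 0] = -1.0         # does not affect a_demo
print(a_demo[0])                 # 99.0
print(a_demo.shape, independent.shape)      # (10,) (5, 2)
```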